Fast deliberation is related to unconditional behaviour in iterated Prisoners’ Dilemma experiments

People have different preferences for what they allocate to themselves and what they allocate to others in social dilemmas. These differences result from contextual reasons, intrinsic values, and social expectations. It remains debated whether these differences can be estimated from differences in each individual’s deliberation process. In this work, we analyse the participants’ reaction times in three different experiments of the Iterated Prisoner’s Dilemma with the Drift Diffusion Model, which links response times to the perceived difficulty of the decision task, the rate of accumulation of information (deliberation), and the intuitive attitudes towards the choices. We then determine the correlation between these results and the participants’ attitudes towards the allocation of resources. We observe that individuals who allocated resources equally deliberated more than highly cooperative or highly defective participants, who accumulated evidence more quickly to reach a decision. Evidence collection is also faster in fixed-neighbour settings than in shuffled ones. Consequently, fast decisions do not distinguish cooperators from defectors in these experiments, but appear to separate those who are more reactive to the behaviour of others from those who act categorically.

Supplementary Table S1: Details concerning the datasets used in the current analysis. There is one pairwise experiment (PIPD) with two treatments, and two network experiments (Von Neumann and Moore neighbourhoods) with two treatments each.

Relative Allocation measure
To quantify the behavioural heterogeneity in cooperation preferences from the data of the different IPD experiments, we construct an adaptation of the Slider Measure of Murphy et al. [4]. Murphy and colleagues propose examining at least 6 items, called the primary slider items, in which subjects decide how to allocate a resource between themselves and another person over a continuum of joint payoffs. Ahn et al. constructed a similar measure to analyse possible payoffs and the extent of greed and inequality aversion in participants of behavioural experiments [5]. As this slider information was not acquired from the participants in the PIPD, mNIPD or vnNIPD experiments, their heterogeneity in social motivations needs to be determined differently. We define here the Relative Allocation (RA) measure as the allocation of payoffs to self and others relative to the allocations one experiences from others. The assumption made here is that a person's cooperative or individualistic motivations, as captured by the RA° approach, are correlated with the behavioural decisions that the person takes in a given game context. Here, the context is defined by the actions and gains one observes of the others in previous rounds while playing the IPD. We are aware that the IPD constitutes a strategic game, in which not only a participant's values are displayed but also their strategic nature, as mentioned by Murphy and Ackerman [6]. Nonetheless, measuring the preferences and heterogeneity of motivations in a given context is worth considering, as the observed trajectory of decisions will be correlated with a person's values. For example, whether participants choose to defect when facing defection from their opponent in the previous round, or instead choose to "forgive", might be the result of their values and expectations [1].
To measure RA, we used the planned allocation for themselves (a_self) and for others (a_other) that subjects had when taking a decision, i.e. whether they cooperate or not, given the context c of the decision, which is defined here as the number of co-players that cooperated in the last round played. This way, if we visualize them in a Cartesian plane as in Figure S1A, subjects end up situated at an angle from the origin of the plane. In pairwise experiments, the context c can be c = 1 or c = 0, but in network experiments it ranges over c ∈ {0, 1, ..., 4} in the vnNIPD and c ∈ {0, 1, ..., 8} in the mNIPD experiments. The calculation of a_self and a_other is given by Equations 1 and 2, respectively. N is the number of partners playing with the subject in the game, so for pairwise experiments N = 2 and for the network experiments, vnNIPD and mNIPD, N = 4 and N = 8 respectively. The parameters R, S, T and P are the payoffs used in the different IPD experiments. To account for different payoff matrices, including negative ones, we normalized the payoffs in all matrices, preserving the proportion of the R, S, T and P parameters, but limiting the range from 0 to 1.
For example, in PIPD_f, if a subject knows their partner cooperated in the previous round, they might choose to reciprocate, planning an allocation of 3 for each (a_self = 3 and a_other = 3), meaning that they allocate the same to themselves and to their opponent (see the orange point in Figure S1A for its position on the Cartesian plane). On the other hand, if they choose to defect, they planned an allocation of 4 for themselves and 0 for their partner (a_self = 4 and a_other = 0, see the red point in Figure S1A).
A similar decision is taken in the network experiments, where a subject sees that c people cooperated in the previous round and has to decide their allocation given that context. For example, if in the previous round only 2 out of 8 participants cooperated (c = 2, N = 8) and the subject chooses cooperation, that would yield a_self = 14 and a_other = 74 (see the green point in Figure S1A for its position on the Cartesian plane).
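The payoff normalization mentioned above can be illustrated with a minimal sketch. It assumes a min-max rescaling, which maps the smallest payoff to 0 and the largest to 1 while keeping the relative spacing of R, S, T and P; the exact rescaling used in the analysis is not spelled out in the text, so this is only one plausible reading. The function name `normalize_payoffs` and the value P = 1 in the usage example are hypothetical.

```python
def normalize_payoffs(R, S, T, P):
    """Rescale the four payoffs to the range [0, 1].

    Assumption: min-max rescaling, chosen here because it also
    handles matrices with negative entries.
    """
    payoffs = {"R": R, "S": S, "T": T, "P": P}
    lowest = min(payoffs.values())
    highest = max(payoffs.values())
    if highest == lowest:
        raise ValueError("payoffs must not all be equal")
    span = highest - lowest
    return {name: (value - lowest) / span for name, value in payoffs.items()}


# Usage: R = 3, T = 4 and S = 0 follow the PIPD example in the text;
# P = 1 is a hypothetical value used only for illustration.
positive = normalize_payoffs(R=3, S=0, T=4, P=1)

# A hypothetical matrix with negative payoffs.
shifted = normalize_payoffs(R=1, S=-2, T=2, P=-1)
```

Under this assumption, two matrices that differ only by a shift or a positive scaling factor end up with identical normalized payoffs.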
We took the first 20 rounds of each experiment to develop the RA° metric, in order to have enough occurrences of each context and to be able to measure the participants' preferences for cooperation. By taking the mean of these two measures over these rounds (represented by ā_other and ā_self), we can visualize where subjects ended up in a Cartesian plane, as shown in Figure S1, where their motivation is determined by the angle from the origin (hence the RA° notation), as given by Equation 3, which is equivalent to what Murphy et al. show [4].
This way, the higher the RA°, i.e. the closer to 90°, the more cooperative the person remains while being confronted with defectors, consequently allocating more to others than to themselves, a behaviour similar to "All-C". An RA° near zero means that the person allocates more to themselves than to others, i.e. plays defect when confronted with cooperation. This behaviour can be considered "individualistic" and is similar to "All-D". Those around 45° acted conditionally on their opponents' past behaviour, responding with defect or cooperate depending on the observed context and their own intentions.
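As a rough illustration of how the angle can be obtained from the mean allocations, the sketch below computes the angle from the origin to the point (ā_self, ā_other). Equation 3 is not reproduced in this text, so the arctangent form used here is an assumption based on the description above, and the function name `relative_allocation_angle` is hypothetical.

```python
import math


def relative_allocation_angle(a_self, a_other):
    """Return the angle, in degrees, of the point (mean a_self, mean a_other).

    Assumption: RA° = arctan(mean a_other / mean a_self), measured from
    the a_self axis, so 0° is fully self-regarding and 90° is fully
    other-regarding.
    """
    if len(a_self) != len(a_other) or not a_self:
        raise ValueError("need two non-empty sequences of equal length")
    mean_self = sum(a_self) / len(a_self)
    mean_other = sum(a_other) / len(a_other)
    # atan2 avoids a division by zero when mean_self is 0.
    return math.degrees(math.atan2(mean_other, mean_self))


# The three example points from Figure S1A, each treated as a single round.
orange = relative_allocation_angle([3], [3])    # equal split in PIPD
red = relative_allocation_angle([4], [0])       # defection in PIPD
green = relative_allocation_angle([14], [74])   # cooperation in mNIPD
```

With these inputs the orange point lies at 45°, the red point at 0°, and the green point at about 79°, matching the interpretation of equal, individualistic and highly cooperative allocations.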
Supplementary Figure S1: Left: Visual representation of the Relative Allocation. The Relative Allocation (RA) is the ratio of how much each participant allocates for themselves (a_self) and for others (a_other). In this example, the points represent three different scenarios: the green point, where a subject allocates 74 for others and 14 for herself in mNIPD; the orange point, where she allocates 3 for each in PIPD; and the red point, where a subject allocates zero for others and 4 for herself in PIPD. See Methods for more details on this calculation. Right: Visual representation of the DDM as a series of random walks over time. The coloured lines represent a random walk of deliberation between two options: cooperation and defection. The parameter a represents the width of the decision spectrum, while z represents the starting point of the deliberation process; when the walk crosses a threshold, a decision is taken at time t. The arrows represent the drift that attracts the subjects towards defection in the IPD, and the blue curves the distribution of the time t that subjects took to make the decision.

Supplementary Figure S2: Rationality measure vs. relative allocation per treatment. The rationality measure, developed by Gallotti et al. [7], measures how much each decision during the game is influenced by the initial bias (z). Here, the scores were computed individually; each point represents a player, and the lines indicate a significant Spearman correlation. It can be seen that, except for PIPD_f (upper right), all treatments present a negative correlation between RA° and the rationality measure. In other words, the more selfish the player was, the more they relied on deliberation, as opposed to players with high RA°, who relied more on their intuition.